Paraphrase type identification for plagiarism detection using contexts and word embeddings

Abstract

Paraphrase types have been proposed by researchers as the paraphrasing mechanisms underlying acts of plagiarism. Synonymous substitution, word reordering and insertion/deletion have been identified as some of the common strategies used by plagiarists. However, the similarity reports generated by most plagiarism detection systems only provide a similarity score and produce matching sections of text with their possible sources. In this research we propose methods to identify two important paraphrase types, synonymous substitution and word reordering, in paraphrased, plagiarised sentence pairs. We propose a three-staged approach that uses context matching and pretrained word embeddings for identifying synonymous substitution and word reordering. Our evaluation indicates that the use of the Smith-Waterman algorithm for plagiarism detection together with ConceptNet Numberbatch pretrained word embeddings produces the best performance in terms of $F_1$ scores. This research can be used to complement currently available plagiarism detection systems by incorporating paraphrase type identification as a feature for plagiarism detection.
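The abstract names Smith-Waterman local alignment as the context-matching step but gives no details. The following is a minimal sketch of token-level Smith-Waterman alignment, not the authors' implementation: the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the helper `substitution_candidates` are assumptions chosen for illustration. Aligned positions where the two tokens differ are treated as candidates for synonymous substitution; a later stage, not shown here, would compare them with pretrained word embeddings.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of token lists a and b.

    Returns (best_score, pairs), where pairs is a list of
    (token_from_a, token_from_b) tuples; None marks a gap.
    Scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return best, pairs


def substitution_candidates(pairs):
    """Hypothetical helper: aligned positions whose tokens differ."""
    return [(x, y) for x, y in pairs if x is not None and y is not None and x != y]


source = "the quick brown fox jumps".split()
suspect = "a quick brown dog jumps".split()
best_score, alignment = smith_waterman(source, suspect)
print(best_score, alignment)
print(substitution_candidates(alignment))
```

In this sketch the shared context ("quick brown ... jumps") anchors the alignment, so the one mismatched position inside it surfaces as a substitution candidate. Whether that pair counts as a synonym would depend on an embedding similarity check with a threshold, which the abstract does not specify.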

Similar resources

Detecting Cross-Lingual Plagiarism Using Simulated Word Embeddings

Cross-lingual plagiarism (CLP) occurs when texts written in one language are translated into a different language and used without acknowledging the original sources. One of the most common methods for detecting CLP requires online machine translators (such as Google or Microsoft translate) which are not always available, and given that plagiarism detection typically involves large document com...

Clickbait detection using word embeddings

Clickbait is a pejorative term describing web content that is aimed at generating online advertising revenue, especially at the expense of quality or accuracy, relying on sensationalist headlines or eye-catching thumbnail pictures to attract click-throughs and to encourage forwarding of the material over online social networks. We use distributed word representations of the words in the title as...

Methods for Detecting Paraphrase Plagiarism

Paraphrase plagiarism is one of the difficult challenges facing plagiarism detection systems. Paraphrasing occurs when texts are lexically or syntactically altered to look different but retain their original meaning. Most plagiarism detection systems (many of which are commercial) are designed to detect word co-occurrences and light modifications, but are unable to detect severe semantic ...

Paraphrase Identification Using Weighted Dependencies and Word Semantics

We present in this article a novel approach to the task of paraphrase identification. The proposed approach quantifies both the similarity and dissimilarity between two sentences. The similarity and dissimilarity are assessed based on lexico-semantic information, i.e., word semantics, and syntactic information in the form of dependencies, which are explicit syntactic relations between words in a...

Different Contexts Lead to Different Word Embeddings

Recent work on learning word representations has been applied successfully to many NLP applications, such as sentiment analysis and question answering. However, most of these models assume a single vector per word type without considering polysemy and homonymy. In this paper, we present an extension to the CBOW model which not only improves the quality of embeddings but also makes embeddings suitab...

Journal

Journal title: International Journal of Educational Technology in Higher Education

Year: 2021

ISSN: 2365-9440

DOI: https://doi.org/10.1186/s41239-021-00277-8